Västerås
A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees
Ahmad, Faseeh, Ismail, Hashim, Styrud, Jonathan, Stenmark, Maj, Krueger, Volker
Robotic systems often face execution failures due to unexpected obstacles, sensor errors, or environmental changes. Traditional failure recovery methods rely on predefined strategies or human intervention, making them less adaptable. This paper presents a unified failure recovery framework that combines Vision-Language Models (VLMs), a reactive planner, and Behavior Trees (BTs) to enable real-time failure handling. Our approach includes pre-execution verification, which checks for potential failures before execution, and reactive failure handling, which detects and corrects failures during execution by verifying existing BT conditions, adding missing preconditions and, when necessary, generating new skills. The framework uses a scene graph for structured environmental perception and an execution history for continuous monitoring, enabling context-aware and adaptive failure handling. We evaluate our framework through real-world experiments with an ABB YuMi robot on tasks like peg insertion, object sorting, and drawer placement, as well as in AI2-THOR simulator. Compared to using pre-execution and reactive methods separately, our approach achieves higher task success rates and greater adaptability. Ablation studies highlight the importance of VLM-based reasoning, structured scene representation, and execution history tracking for effective failure recovery in robotics.
Combating Partial Perception Deficit in Autonomous Driving with Multimodal LLM Commonsense
Hu, Yuting, Xu, Chenhui, Qin, Ruiyang, Liu, Dancheng, Nassereldine, Amir, Shi, Yiyu, Xiong, Jinjun
Partial perception deficits can compromise autonomous vehicle safety by disrupting environmental understanding. Current protocols typically respond with immediate stops or minimal-risk maneuvers, worsening traffic flow and lacking flexibility for rare driving scenarios. In this paper, we propose LLM-RCO, a framework leveraging large language models to integrate human-like driving commonsense into autonomous systems facing perception deficits. LLM-RCO features four key modules: hazard inference, short-term motion planner, action condition verifier, and safety constraint generator. These modules interact with the dynamic driving environment, enabling proactive and context-aware control actions to override the original control policy of autonomous agents. To improve safety in such challenging conditions, we construct DriveLM-Deficit, a dataset of 53,895 video clips featuring deficits of safety-critical objects, complete with annotations for LLM-based hazard inference and motion planning fine-tuning. Extensive experiments in adverse driving conditions with the CARLA simulator demonstrate that systems equipped with LLM-RCO significantly improve driving performance, highlighting its potential for enhancing autonomous driving resilience against adverse perception deficits. Our results also show that LLMs fine-tuned with DriveLM-Deficit can enable more proactive movements instead of conservative stops in the context of perception deficits.
Exploring FMCW Radars and Feature Maps for Activity Recognition: A Benchmark Study
Fard, Ali Samimi, Mashhadigholamali, Mohammadreza, Zolfaghari, Samaneh, Abedi, Hajar, Chakraborty, Mainak, Borzì, Luigi, Daneshtalab, Masoud, Shaker, George
Human Activity Recognition has gained significant attention due to its diverse applications, including ambient assisted living and remote sensing. Wearable sensor-based solutions often suffer from user discomfort and reliability issues, while video-based methods raise privacy concerns and perform poorly in low-light conditions or long ranges. This study introduces a Frequency-Modulated Continuous Wave radar-based framework for human activity recognition, leveraging a 60 GHz radar and multi-dimensional feature maps. Unlike conventional approaches that process feature maps as images, this study feeds multi-dimensional feature maps -- Range-Doppler, Range-Azimuth, and Range-Elevation -- as data vectors directly into the machine learning (SVM, MLP) and deep learning (CNN, LSTM, ConvLSTM) models, preserving the spatial and temporal structures of the data. These features were extracted from a novel dataset with seven activity classes and validated using two different validation approaches. The ConvLSTM model outperformed conventional machine learning and deep learning models, achieving an accuracy of 90.51% and an F1-score of 87.31% on cross-scene validation and an accuracy of 89.56% and an F1-score of 87.15% on leave-one-person-out cross-validation. The results highlight the approach's potential for scalable, non-intrusive, and privacy-preserving activity monitoring in real-world scenarios.
Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models
Turcato, Niccolò, Iovino, Matteo, Synodinos, Aris, Libera, Alberto Dalla, Carli, Ruggero, Falco, Pietro
Recent advancements in Large Language Models (LLMs) and Visual Language Models (VLMs) have significantly impacted robotics, enabling high-level semantic motion planning applications. Reinforcement Learning (RL), a complementary paradigm, enables agents to autonomously optimize complex behaviors through interaction and reward signals. However, designing effective reward functions for RL remains challenging, especially in real-world tasks where sparse rewards are insufficient and dense rewards require elaborate design. In this work, we propose Autonomous Reinforcement learning for Complex HumanInformed Environments (ARCHIE), an unsupervised pipeline leveraging GPT-4, a pre-trained LLM, to generate reward functions directly from natural language task descriptions. The rewards are used to train RL agents in simulated environments, where we formalize the reward generation process to enhance feasibility. Additionally, GPT-4 automates the coding of task success criteria, creating a fully automated, one-shot procedure for translating human-readable text into deployable robot skills. Our approach is validated through extensive simulated experiments on single-arm and bi-manual manipulation tasks using an ABB YuMi collaborative robot, highlighting its practicality and effectiveness. Tasks are demonstrated on the real robot setup.